Curse of dimensionality - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

Curse of dimensionality

https://gyazo.com/a584556b87390d4fbad2747e235a0017

Humans have difficulty imagining spaces with more than about 2 to 4 dimensions.

Many counterintuitive phenomena appear as the number of dimensions grows.

In high-dimensional space, almost all points are far from the center.

In one dimension, the region within distance 1 of the origin is half the size of the region within distance 2.

In 2 dimensions, it is 1/4.

In 3 dimensions, 1/8.

...and so on: the fraction of "close" points, (1/2)^d, decreases exponentially as the dimension d increases.
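The shrinking fraction can be checked with the standard formula for the volume of a d-dimensional Euclidean ball, V_d(r) = π^(d/2) / Γ(d/2 + 1) · r^d. In this minimal sketch (the function names are illustrative, not from the original page) the constant cancels and the ratio comes out as (1/2)^d.

```python
import math


def ball_volume(d: int, r: float) -> float:
    """Volume of a d-dimensional Euclidean ball of radius r."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


def fraction_close(d: int) -> float:
    """Fraction of the radius-2 ball that lies within distance 1 of the origin."""
    return ball_volume(d, 1.0) / ball_volume(d, 2.0)


for d in (1, 2, 3, 10, 100):
    # The constant pi**(d/2) / gamma(d/2 + 1) cancels, leaving (1/2)**d.
    print(f"d={d:3d}  fraction={fraction_close(d):.3e}  (1/2)^d={0.5 ** d:.3e}")
```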

Consequently, the number of samples needed to cover the space at the same density grows exponentially with the dimension.

In machine learning, adding more input dimensions (features) can make accuracy worse.

This is because the harm from having too few samples for the larger space outweighs the accuracy gained from the extra dimensions.
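One way to see why a fixed sample becomes insufficient (this demonstration is not from the original page): keep the number of points fixed, scatter them uniformly in the unit cube, and measure how far the nearest one is from the center. The sketch assumes a hypothetical sample size of n = 1000 and uniform data. As d grows, the "nearest" point ends up almost as far away as a typical random point, so a model that relies on nearby examples has nothing truly close to learn from.

```python
import math
import random


def nearest_distance_to_center(d: int, n: int, rng: random.Random) -> float:
    """Distance from the center of [0, 1]^d to the closest of n uniformly drawn points."""
    center = [0.5] * d
    best = math.inf
    for _ in range(n):
        point = [rng.random() for _ in range(d)]
        best = min(best, math.dist(point, center))
    return best


rng = random.Random(0)  # fixed seed for reproducibility
n = 1000  # hypothetical fixed sample size
for d in (1, 2, 10, 100):
    nearest = nearest_distance_to_center(d, n, rng)
    # Root-mean-square distance from the center to a uniform random point is sqrt(d / 12).
    typical = math.sqrt(d / 12)
    print(f"d={d:3d}  nearest={nearest:.3f}  typical={typical:.3f}  ratio={nearest / typical:.2f}")
```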

https://gyazo.com/934a40866acc18c6b266fdbb0c8b1ac2

Chi-square distribution - Wikipedia

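For 3 or more dimensions, the mode of the squared vector length is non-zero.

This assumes each axis independently follows a standard normal distribution, whose mode is 0. The squared length of such a k-dimensional vector follows a chi-square distribution with k degrees of freedom, whose mode is max(k - 2, 0).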

chi-square distribution

This is related to "most points are far from the center."
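A small sketch of this, assuming (as the note does) independent standard normal coordinates in k dimensions: the chi-square mode max(k - 2, 0) is computed directly, and a Monte Carlo estimate shows how quickly the chance of landing within distance 1 of the origin shrinks. The function names are illustrative.

```python
import random


def chi2_mode(k: int) -> int:
    """Mode of the chi-square distribution with k degrees of freedom."""
    return max(k - 2, 0)


def fraction_inside_unit_ball(k: int, samples: int, rng: random.Random) -> float:
    """Monte Carlo share of k-dimensional standard-normal vectors with length below 1."""
    inside = 0
    for _ in range(samples):
        # Squared length of a vector with i.i.d. N(0, 1) coordinates: chi-square with k d.o.f.
        squared_length = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        if squared_length < 1.0:
            inside += 1
    return inside / samples


rng = random.Random(0)
for k in (1, 2, 3, 10):
    share = fraction_inside_unit_ball(k, 20_000, rng)
    print(f"k={k:2d}  mode of squared length={chi2_mode(k)}  P(length < 1) ~ {share:.4f}")
```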

Almost all pairs of vectors are nearly orthogonal

Sampling 1,000,000 times and measuring the fraction of samples whose cosine similarity is greater than 1/2 gives:

0.06 (about 1/17) in 10 dimensions,

0.01 (about 1/100) in 20 dimensions,

0.0021 (about 1/480) in 30 dimensions,

0.00042 (about 1/2400) in 40 dimensions

In 100 dimensions, none of 10,000,000 sampled points had a cosine similarity greater than 1/2.

In two dimensions, of course, it is 33%: the angle between the vectors must be within ±60 degrees, which is 120 out of 360 degrees.
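The note does not say how the random vectors were drawn. This sketch assumes isotropic Gaussian vectors (equivalently, uniformly random directions) and uses far fewer samples than the 1,000,000 in the note, so its estimates are rougher. Under that assumption it should come out close to the figures listed, and near 1/3 in two dimensions.

```python
import math
import random


def random_gaussian_vector(d: int, rng: random.Random) -> list[float]:
    """A vector with i.i.d. standard normal coordinates, i.e. a uniformly random direction."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def fraction_similar(d: int, samples: int, rng: random.Random, threshold: float = 0.5) -> float:
    """Fraction of random vector pairs whose cosine similarity exceeds threshold."""
    hits = 0
    for _ in range(samples):
        if cosine(random_gaussian_vector(d, rng), random_gaussian_vector(d, rng)) > threshold:
            hits += 1
    return hits / samples


rng = random.Random(0)
for d in (2, 10, 20, 30):
    # 20,000 pairs per dimension keeps the run short; the note used 1,000,000.
    print(f"d={d:2d}  fraction with cosine > 1/2 ~ {fraction_similar(d, 20_000, rng):.4f}")
```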

Related

https://twitter.com/nishio/status/1258610796969340928?s=21

diversity

If you take two random vectors in a high-dimensional space, the probability that they point in nearly the same direction is very small compared to the probability that they are nearly orthogonal.

As the number of dimensions (evaluation axes) increases, the probability that one person's skills completely dominate another's, that is, are better on every axis, decreases.

100% in 1D, 50% in 2D, 25% in 3D
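The listed percentages are consistent with a simple model, assumed here rather than stated in the note: each person's score on each axis is an independent draw from the same continuous distribution. One person dominates only by winning on all d axes, so the probability that either of the two dominates is 2 × (1/2)^d, which gives 100%, 50%, and 25% for d = 1, 2, 3. A minimal sketch comparing that formula with a simulation:

```python
import random


def dominance_probability_exact(d: int) -> float:
    """P(one of two people beats the other on every axis), for i.i.d. continuous scores."""
    # Either A wins on all d axes or B does: 2 * (1/2)**d.
    return 2 * 0.5 ** d


def dominance_probability_simulated(d: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate using uniform scores on each axis."""
    dominated = 0
    for _ in range(trials):
        a = [rng.random() for _ in range(d)]
        b = [rng.random() for _ in range(d)]
        if all(x > y for x, y in zip(a, b)) or all(x < y for x, y in zip(a, b)):
            dominated += 1
    return dominated / trials


rng = random.Random(0)
for d in (1, 2, 3, 10):
    exact = dominance_probability_exact(d)
    simulated = dominance_probability_simulated(d, 20_000, rng)
    print(f"d={d:2d}  exact={exact:.4f}  simulated={simulated:.4f}")
```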

https://gyazo.com/1b7ed946d22e1cceca40118b9cc7ee6f

In high-dimensional space, samples from a standard normal distribution lie almost uniformly on a hypersphere (of radius about √d in d dimensions).
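A quick way to see this, assuming i.i.d. standard normal coordinates: divide each sample's length by √d. The mean stays near 1 while the spread shrinks roughly like 1/√d, so in high dimensions nearly all the probability mass sits in a thin shell. Because the Gaussian is rotationally symmetric, directions on that shell are uniform. The function name below is illustrative.

```python
import math
import random
import statistics


def normalized_lengths(d: int, samples: int, rng: random.Random) -> list[float]:
    """Lengths of standard-normal vectors in d dimensions, divided by sqrt(d)."""
    return [
        math.hypot(*(rng.gauss(0.0, 1.0) for _ in range(d))) / math.sqrt(d)
        for _ in range(samples)
    ]


rng = random.Random(0)
for d in (2, 10, 100, 1000):
    lengths = normalized_lengths(d, 1000, rng)
    # Mean near 1 in high dimensions; standard deviation shrinks as d grows.
    print(f"d={d:4d}  mean={statistics.mean(lengths):.3f}  stdev={statistics.stdev(lengths):.3f}")
```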

Almost every stationary point is a saddle point.

99.8% in 10 dimensions
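The 99.8% figure matches a simple model, assumed here rather than stated in the note: at a stationary point of a function of d variables, each of the d principal curvatures (Hessian eigenvalues) is independently positive or negative with probability 1/2. The point is a minimum or a maximum only if all signs agree, so:

```latex
P(\text{saddle}) = 1 - 2\left(\tfrac{1}{2}\right)^{d} = 1 - 2^{1-d},
\qquad
d = 10:\quad 1 - \tfrac{1}{512} \approx 0.998
```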

Cases where only one particular axis is larger than the others are rare.

blind spot card 19

---

This page is auto-translated from /nishio/次元の呪い using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.